24 research outputs found

    A Critical Comparative Assessment of Predictions of Protein-Binding Sites for Biologically Relevant Organic Compounds

    Summary: Protein function annotation and rational drug discovery rely on the knowledge of binding sites for small organic compounds, and yet the quality of existing binding site predictors was never systematically evaluated. We assess predictions of ten representative geometry-, energy-, threading-, and consensus-based methods on a new benchmark data set that considers apo and holo protein structures with multiple binding sites for biologically relevant ligands. Statistical tests show that threading-based Findsite outperforms other predictors when its templates have high similarity with the input protein. However, Findsite is equivalent or inferior to some geometry-, energy-, and consensus-based methods when the similarity is lower. We demonstrate that geometry-, energy-, and consensus-based predictors benefit from the usage of holo structures and that the top four methods, Findsite, Q-SiteFinder, ConCavity, and MetaPocket, perform better for larger binding sites. Predictions from these four methods are complementary, and our simple meta-predictor improves over the best single predictor.

    A creature with a hundred waggly tails: intrinsically disordered proteins in the ribosome

    This article is made available for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. Intrinsic disorder (i.e., lack of a unique 3-D structure) is a common phenomenon, and many biologically active proteins are disordered as a whole, or contain long disordered regions. These intrinsically disordered proteins/regions constitute a significant part of all proteomes, and their functional repertoire is complementary to functions of ordered proteins. In fact, intrinsic disorder represents an important driving force for many specific functions. An illustrative example of such disorder-centric functional class is RNA-binding proteins. In this study, we present the results of comprehensive bioinformatics analyses of the abundance and roles of intrinsic disorder in 3,411 ribosomal proteins from 32 species. We show that many ribosomal proteins are intrinsically disordered or hybrid proteins that contain ordered and disordered domains. Predicted globular domains of many ribosomal proteins contain noticeable regions of intrinsic disorder. We also show that disorder in ribosomal proteins has different characteristics compared to other proteins that interact with RNA and DNA including overall abundance, evolutionary conservation, and involvement in protein–protein interactions. Furthermore, intrinsic disorder is not only abundant in the ribosomal proteins, but we demonstrate that it is absolutely necessary for their various functions.

    In-silico prediction of disorder content using hybrid sequence representation

    Background: Intrinsically disordered proteins play important roles in various cellular activities and their prevalence was implicated in a number of human diseases. The knowledge of the content of the intrinsic disorder in proteins is useful for a variety of studies including estimation of the abundance of disorder in protein families, classes, and complete proteomes, and for the analysis of disorder-related protein functions. The above investigations currently utilize the disorder content derived from the per-residue disorder predictions. We show that these predictions may over- or under-predict the overall amount of disorder, which motivates development of novel tools for direct and accurate sequence-based prediction of the disorder content. Results: We hypothesize that sequence-level aggregation of input information may provide more accurate content prediction when compared with the content extracted from the local window-based residue-level disorder predictors. We propose a novel predictor, DisCon, that takes advantage of a small set of 29 custom-designed descriptors that aggregate and hybridize information concerning sequence, evolutionary profiles, and predicted secondary structure, solvent accessibility, flexibility, and annotation of globular domains. Using these descriptors and a ridge regression model, DisCon predicts the content with a low mean squared error (0.05) and a high Pearson correlation (0.68). This is a statistically significant improvement over the content computed from outputs of ten modern disorder predictors on a test dataset with proteins that share low sequence identity with the training sequences. The proposed predictive model is analyzed to discuss factors related to the prediction of the disorder content. Conclusions: DisCon is a high-quality alternative for high-throughput annotation of the disorder content. We also empirically demonstrate that DisCon's predictions can be used to improve binary annotations of the disordered residues from the real-value disorder propensities generated by current residue-level disorder predictors. The web server that implements DisCon is available at http://biomine.ece.ualberta.ca/DisCon/.

    MoRFpred, a computational tool for sequence-based prediction and characterization of short disorder-to-order transitioning binding regions in proteins

    Motivation: Molecular recognition features (MoRFs) are short binding regions located within longer intrinsically disordered regions that bind to protein partners via disorder-to-order transitions. MoRFs are implicated in important processes including signaling and regulation. However, only a limited number of experimentally validated MoRFs are known, which motivates development of computational methods that predict MoRFs from protein chains.

    Modular prediction of protein structural classes from sequences of twilight-zone identity with predicting sequences

    Background: Knowledge of structural class is used by numerous methods for identification of structural/functional characteristics of proteins and could be used for the detection of remote homologues, particularly for chains that share twilight-zone similarity. In contrast to existing sequence-based structural class predictors, which target four major classes and which are designed for high identity sequences, we predict seven classes from sequences that share twilight-zone identity with the training sequences. Results: The proposed MODular Approach to Structural class prediction (MODAS) method is unique as it allows for selection of any subset of the classes. MODAS is also the first to utilize a novel, custom-built feature-based sequence representation that combines evolutionary profiles and predicted secondary structure. The features quantify information relevant to the definition of the classes including conservation of residues and arrangement and number of helix/strand segments. Our comprehensive design considers 8 feature selection methods and 4 classifiers to develop Support Vector Machine-based classifiers that are tailored for each of the seven classes. Tests on 5 twilight-zone and 1 high-similarity benchmark datasets and comparison with over two dozen modern competing predictors show that MODAS provides the best overall accuracy that ranges between 80% and 96.7% (83.5% for the twilight-zone datasets), depending on the dataset. This translates into 19% and 8% error rate reduction when compared against the best performing competing method on the two largest datasets. The proposed predictor provides accurate predictions at 58% accuracy for the membrane proteins class, which is not considered by the majority of existing methods, even though this class accounts for only 2% of the data. Our predictive model is analyzed to demonstrate how and why the input features are associated with the corresponding classes. Conclusions: The improved predictions stem from the novel features that express collocation of the secondary structure segments in the protein sequence and that combine evolutionary and secondary structure information. Our work demonstrates that conservation and arrangement of the secondary structure segments predicted along the protein chain can successfully predict structural classes which are defined based on the spatial arrangement of the secondary structures. A web server is available at http://biomine.ece.ualberta.ca/MODAS/.

    Prediction of Intrinsic Disorder in Proteins Using MFDp2

    No full text
    Intrinsically disordered proteins (IDPs) are either entirely disordered or contain disordered regions in their native state. IDPs were found to be abundant across all kingdoms of life, particularly in eukaryotes, and are implicated in numerous cellular processes. Experimental annotation of disorder lags behind the rapidly growing sizes of the protein databases and thus computational methods are used to close this gap and to investigate the disorder. MFDp2 is a novel webserver for accurate sequence-based prediction of protein disorder which also outputs well-described sequence-derived information that allows profiling the predicted disorder. We conveniently visualize sequence conservation, predicted secondary structure, relative solvent accessibility, and alignments to chains with annotated disorder. The webserver allows predictions for multiple proteins at the same time, includes help pages and a tutorial, and the results can be downloaded as a text-based (parsable) file. MFDp2 is freely available at http://biomine.ece.ualberta.ca/MFDp2/

    Protein Intrinsic Disorder as a Flexible Armor and a Weapon of HIV-1

    No full text
    Many proteins and protein regions are disordered in their native, biologically active states. These proteins/regions are abundant in different organisms and carry out important biological functions that complement the functional repertoire of ordered proteins. Viruses, with their highly compact genomes, small proteomes, and high adaptability to fast changes in their biological and physical environment, utilize many of the advantages of intrinsic disorder. In fact, viral proteins are generally rich in intrinsic disorder, and intrinsically disordered regions are commonly used by viruses to invade the host organisms, to hijack various host systems, to help viruses accommodate to their hostile habitats, and to manage their economic usage of genetic material. In this review, we focus on the structural peculiarities of HIV-1 proteins, on the abundance of intrinsic disorder in viral proteins, and on the role of intrinsic disorder in their functions.

    More Than Just Tails: Intrinsic Disorder in Histone Proteins

    No full text
    Many biologically active proteins are disordered as a whole, or contain long disordered regions. These intrinsically disordered proteins/regions are very common in nature, abundantly found in all organisms, where they carry out important biological functions. The functions of these proteins complement the functional repertoire of normal ordered proteins, and many protein functional classes are heavily dependent on intrinsic disorder. Among these disorder-centric functions are interactions with nucleic acids and protein complex assembly. In this study, we present the results of comprehensive bioinformatics analyses of the abundance and roles of intrinsic disorder in 2007 histones from 746 species. We show that all the members of the histone family are intrinsically disordered proteins. Furthermore, intrinsic disorder is not only abundant in histones, but is absolutely necessary for various histone functions, starting from heterodimerization to formation of higher order oligomers, to interactions with DNA and other proteins, and to posttranslational modifications.